[fix](load) fix load_to_single_tablet routing for auto partition#64356
[fix](load) fix load_to_single_tablet routing for auto partition#64356sollhui wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
### What problem does this PR solve? Issue Number: None Related PR: apache#64356 Problem Summary: Resolve the PR merge conflict with current master while preserving the load_to_single_tablet auto partition load tablet index cache fix and the latest master cloud tablet location lookup logic. ### Release note None ### Check List (For Author) - Test: ./build-support/check-format.sh; ./run-fe-ut.sh --run org.apache.doris.transaction.AutoPartitionCacheManagerTest (failed before running the target test because fe-catalog does not compile on current workspace/master: Column.java TPatternType conversion and missing ScalarType variant methods) - Behavior changed: No - Does this need documentation: No
|
/review |
|
run buildall |
There was a problem hiding this comment.
I found a compile-blocking issue.
Critical checkpoint conclusions:
- Goal/test: The PR intends to preserve
load_tablet_idxfor runtime auto partitions whenload_to_single_tablet=true. The implementation shape addresses the right create/replace RPC and cache paths, and adds a cache unit test, but the FE code cannot compile in its current form. - Scope/focus: The change is small and focused on BE request propagation, FE response construction, and the auto-partition cache.
- Concurrency/lifecycle: The new cached primitive shares the existing
ConcurrentHashMapentry and transaction cleanup lifecycle; I did not find a new lock-order or lifecycle issue. - Config/compatibility: No config item is added. The Thrift request fields are optional with default false, so the protocol change is structurally compatible.
- Parallel paths: Both createPartition and replacePartition runtime paths are updated, and both BE writer paths use the shared
VRowDistributionrequest code. - Tests/results: A unit test covers
AutoPartitionCacheManagerreturning the cached index, but I did not run the FE test because the missing import is already a compile blocker. - Observability/performance/persistence: No additional issue found; the cache remains in-memory transaction-scoped state and does not add persistence requirements.
- User focus: No additional user-provided focus was supplied.
Please fix the missing import and rerun the relevant FE/BE checks.
### What problem does this PR solve? Issue Number: None Related PR: apache#64356 Problem Summary: Fix FE compile errors after resolving conflicts for the auto partition load tablet index change. FrontendServiceImpl now imports DistributionInfo explicitly, and the cached load tablet index path uses long/AtomicLong to match TOlapTablePartition.load_tablet_idx, which is thrift i64. ### Release note None ### Check List (For Author) - Test: git diff --check; ./run-fe-ut.sh --run org.apache.doris.transaction.AutoPartitionCacheManagerTest (blocked before running target test by existing fe-catalog compile errors in local workspace: Column.java TPatternType conversion and missing ScalarType variant methods) - Behavior changed: No - Does this need documentation: No
0332b6e to
1b03e7e
Compare
|
run buildall |
|
/review |
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
There was a problem hiding this comment.
Review result: no additional blocking issues found in this pass. The existing inline thread about the missing DistributionInfo import is already-known review context, and the current head includes that import, so I did not duplicate it.
Critical checkpoint conclusions:
- Goal and proof: The PR passes load_to_single_tablet from BE auto-partition RPCs and persists the selected loadTabletIdx in AutoPartitionCacheManager so concurrent create/replace calls for the same txn/partition reuse the same tablet index. The added unit test covers cache reuse of the index; no end-to-end auto-partition load test is added.
- Scope: The change is small and focused across the BE RPC request construction, FE RPC handling/cache, thrift request fields, and one FE unit test.
- Concurrency: The new cached value follows the existing ConcurrentHashMap cache path. The first writer wins through computeIfAbsent, later callers receive the cached tablet locations and cached loadTabletIdx. I did not find a new lock-order or heavy-work-under-lock issue.
- Lifecycle: The added value has the same txn-scoped lifecycle as cached tablet locations and is cleared by the existing transaction finish/abort cleanup paths in local and cloud transaction managers.
- Configuration and compatibility: No config or storage format changes. The new thrift fields are optional with default false; older callers preserve previous behavior.
- Parallel paths: Both createPartition and replacePartition, plus both BE automatic create and overwrite replace request paths, are updated.
- Conditional/error handling: The loadTabletIdx is only attached for load_to_single_tablet on random distribution partitions, and UserException from the recorder is converted to RPC status like adjacent code.
- Transaction/data correctness: I did not find a visibility, commit, or atomicity regression; cache keying remains txn id plus partition id.
- Performance and observability: Added work is O(partitions) with one recorder lookup only for the targeted mode; existing debug logging is extended for the cached index.
- Tests: I attempted ./run-fe-ut.sh --run org.apache.doris.transaction.AutoPartitionCacheManagerTest, but this runner is missing thirdparty/installed/bin/protoc, so generated-source setup failed before the test could compile or run.
- User focus: review_focus.txt contains no additional focus points.
Cloud UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 29502 ms |
TPC-H: Total hot run time: 29660 ms |
FE UT Coverage ReportIncrement line coverage |
TPC-DS: Total hot run time: 169075 ms |
TPC-DS: Total hot run time: 170573 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
FE Regression Coverage ReportIncrement line coverage |
| 6: optional bool write_single_replica = false | ||
| 7: optional Types.TUniqueId query_id | ||
| // Whether the caller's table sink is using load_to_single_tablet mode. | ||
| 8: optional bool load_to_single_tablet = false |
There was a problem hiding this comment.
why need this in result?
What problem does this PR solve?
load_tablet_idxis used by random distribution routing. For normal randomloads it can advance across batches, but for
load_to_single_tablet=trueitmust remain fixed for the partition in the sink so all rows are routed to one
tablet.
For normally planned partitions this is stable because FE builds the sink plan
once and all BE sink instances receive the same
load_tablet_idx.For runtime auto partitions, each BE can independently call
createPartition()or
replacePartition()for the same txn and partition. Before this patch, theauto partition cache only stored tablet locations and did not store
load_tablet_idx. FE also did not know whether the caller was usingload_to_single_tablet, so runtime auto partitions could lose the single-tabletrouting semantics or compute different tablet indexes across BE RPCs.
This patch passes
load_to_single_tabletin create/replace partition RPCs andstores the selected
load_tablet_idxinAutoPartitionCacheManager. Later BERPCs for the same txn and partition reuse the cached tablet index together with
the cached tablet locations. Ordinary random loads are unchanged.